Abstract: We consider an infinite collection of agents who 
make decisions, sequentially, about an unknown underlying 
binary state of the world. Each agent, prior to making a decision, 
receives an independent private signal whose distribution depends 
on the state of the world. Moreover, each agent also observes 
the decisions of its last K immediate predecessors. We study 
conditions under which the agent decisions converge to the 
correct value of the underlying state. 

We focus on the case where the private signals have bounded 
information content and investigate whether learning is possible, 
that is, whether there exist decision rules for the different agents 
that result in the convergence of their sequence of individual 
decisions to the correct state of the world. We first consider 
learning in the almost sure sense and show that it is impossible, 
for any value of K. We then explore the possibility of convergence 
in probability of the decisions to the correct state. Here, a 
distinction arises: if K = 1, learning in probability is impossible 
under any decision rule, while for K ≥ 2, we design a decision 
rule that achieves it. 

We finally consider a new model, involving forward looking 
strategic agents, each of which maximizes the discounted sum 
(over all agents) of the probabilities of a correct decision. 
(The case, studied in previous literature, of myopic agents who 
maximize the probability of their own decision being correct is 
an extreme special case.) We show that for any value of K, for 
any equilibrium of the associated Bayesian game, and under the 
assumption that each private signal has bounded information 
content, learning in probability fails to obtain. 



I. Introduction 

In this paper, we study variations and extensions of a 
model introduced and studied in Cover's seminal work [?]. We 
consider a Bayesian binary hypothesis testing problem over an 
"extended tandem" network architecture whereby each agent 
n makes a binary decision $x_n$, based on an independent private 
signal $s_n$ (with a different distribution under each hypothesis) 
and on the decisions $x_{n-1}, \ldots, x_{n-K}$ of its K immediate 
predecessors, where K is a positive integer constant. We are 
interested in the question of whether learning is achieved, that 
is, whether the sequence $\{x_n\}$ correctly identifies the true 
hypothesis (the "state of the world," to be denoted by $\theta$), almost 
surely or in probability, as $n \to \infty$. For K = 1, this coincides 
with the model introduced by Cover [?] under a somewhat 
different interpretation, in terms of a single memory-limited 
agent who acts repeatedly but can only remember its last 
decision. 

At a broader, more abstract level, our work is meant to 
shed light on the question whether distributed information held 
by a large number of agents can be successfully aggregated 
in a decentralized and bandwidth-limited manner. Consider a 
situation where each of a large number of agents has a noisy 
signal about an unknown underlying state of the world $\theta$. 
This state of the world may represent an unknown parameter 
monitored by decentralized sensors, the quality of a product, 
the applicability of a therapy, etc. If the individual signals 
are independent and the number of agents is large, collecting 
these signals at a central processing unit would be sufficient for 
inferring ("learning") the underlying state $\theta$. However, because 
of communication or memory constraints, such centralized 
processing may be impossible or impractical. It then becomes 
of interest to inquire whether $\theta$ can be learned under a 
decentralized mechanism where each agent communicates a 
finite-valued summary of its information (e.g., a purchase or 
voting decision, a comment on the success or failure of a 
therapy, etc.) to a subset of the other agents, who then refine 
their own information about the unknown state. 

Whether learning will be achieved under the model that we 
study depends on various factors, such as the ones discussed 
next: 

(a) As demonstrated in [?], the situation is qualitatively 
different depending on certain assumptions on the in- 
formation content of individual signals. We will focus 
exclusively on the case where each signal has bounded 
information content, in the sense that the likelihood ratio 
associated with a signal is bounded away from zero 
and infinity, the so-called Bounded Likelihood Ratio 
(BLR) assumption. The reason for our focus is that in 
the opposite case (of unbounded likelihood ratios), the 
learning problem is much easier; indeed, [?] shows that 
almost sure learning is possible, even if K = 1. 

(b) An aspect that has been little explored in the prior 
literature is the distinction between different learning 
modes, learning almost surely or in probability. We will 
see that the results can be different for these two modes. 

(c) The results of [?] suggest that there may be a qualitative 
difference depending on the value of K. Our work will 
shed light on this dependence. 

(d) Whether learning will be achieved or not depends on the 
way that agents make their decisions $x_n$. In an engineering 
setting, one can assume that the agents' decision rules 
are chosen (through an offline centralized process) by a 
system designer. In contrast, in game-theoretic models, 
each agent is assumed to be a Bayesian maximizer of 
an individual objective, based on the available informa- 
tion. Our work will shed light on this dichotomy by 
considering a special class of individual objectives that 
incorporate a certain degree of altruism. 
A. Summary of the paper and its contributions 

We provide here a summary of our main results, together 
with comments on their relation to prior works. In what 
follows, we use the term decision rule to refer to the mapping 
from an agent's information to its decision and the term 
decision profile to refer to the collection of the agents' decision 
rules. Unless there is a statement to the contrary, all results 
mentioned below are derived under the BLR assumption. 

(a) Almost sure learning is impossible (Theorem 1). For any 
K ≥ 1, we prove that there exists no decision profile that 
guarantees almost sure convergence of the sequence $\{x_n\}$ 
of decisions to the state of the world $\theta$. This provides 
an interesting contrast with the case where the BLR 
assumption does not hold; in the latter case, almost sure 
learning is actually possible [?]. 

(b) Learning in probability is impossible if K = 1 (Theorem 
2). This strengthens a result of Koplowitz [?], who showed 
the impossibility of learning in probability for the case 
where K = 1 and the private signals $s_n$ are i.i.d. 
Bernoulli random variables. 

(c) Learning in probability is possible if K ≥ 2 (Theorem 3). 
For the case where K ≥ 2, we provide a fairly elaborate 
decision profile that yields learning in probability. This 
result (as well as the decision profile that we construct) is 
inspired by the positive results in [?] and [?], according to 
which, learning in probability (in a slightly different sense 
from ours) is possible if each agent can send 4-valued 
or 3-valued messages, respectively, to its successor. In 
more detail, our construction (when K = 2) exploits the 
similarity between the case of a 4-valued message from 
the immediate predecessor (as in [?]) and the case of 
binary messages from the last two predecessors: indeed, 
the decision rules of two predecessors can be designed 
so that their two binary messages convey (in some sense) 
information comparable to that in a 4-valued message 
by a single predecessor. Still, our argument is somewhat 
more complicated than the ones in [?] and [?], because 
in our case, the actions of the two predecessors cannot 
be treated as arbitrary codewords: they must obey the 
additional requirement that they equal the correct state of 
the world with high probability. 

(d) No learning by forward looking, altruistic agents (Theorem 
[?]). As already discussed, when K ≥ 2, learning is 
possible, using a suitably designed decision profile. On 
the other hand, if each agent acts myopically (i.e., maxi- 
mizes the probability that its own decision is correct), it 
is known that learning will not take place ([?], [?], [?]). 
To further understand the impact of selfish behavior, we 
consider a variation where each agent is forward looking, 
in an altruistic manner: rather than being myopic, each 
agent takes into account the impact of its decisions on 
the error probabilities of future agents. This case can 
be thought of as an intermediate one, where each agent 
makes a decision that optimizes its own utility function 
(similar to the myopic case), but the utility function 
incentivizes the agent to act in a way that corresponds 
to good systemwide performance (similar to the case 
of centralized design). In this formulation, the optimal 
decision rule of each agent depends on the decision rules 
of all other agents (both predecessors and successors), 
which leads to a game-theoretic formulation and a study 
of the associated equilibria. Our main result shows that 
under any (suitably defined) equilibrium, learning in 
probability fails to obtain. In this sense, the forward look- 
ing, altruistic setting falls closer to the myopic rather than 
the engineering design version of the problem. Another 
interpretation of the result is that the carefully designed 
decision profile that can achieve learning will not emerge 
through the incentives provided by the altruistic model; 
this is not surprising because the designed decision profile 
is quite complicated. 

B. Outline of the paper 

The rest of the paper is organized as follows. In Section 
II, we review some of the related literature. In Section III, we 
provide a description of our model, notation, and terminology. 
In Section IV, we show that almost sure learning is impossible. 
In Section V (respectively, Section VI), we show that learning 
in probability is impossible when K = 1 (respectively, 
possible when K ≥ 2). In Section VII, we describe the 
model of forward looking agents and prove the impossibility 
of learning. We conclude with some brief comments in the 
final section. 

II. Related literature 

The literature on information aggregation in decentralized 
systems is vast; we will restrict ourselves to the discussion of 
models that involve a Bayesian formulation and are somewhat 
related to our work. The literature consists of two main 
branches, in statistics/engineering and in economics. 

A. Statistics/engineering literature 

A basic version of the model that we consider was studied 
in the two seminal papers [?] and [?], which have already 
been discussed in the Introduction. The same model was also 
studied in [?], which gave a characterization of the minimum 
probability of error, when all agents decide according to the 
same decision rule. The case of myopic agents and K = 1 
was briefly discussed in [?] who argued that learning (in 
probability) fails to obtain. A proof of this negative result 
was also given in [?], together with the additional result 
that myopic decision rules will lead to learning if the BLR 
assumption is relaxed. Finally, [?] studies myopic decisions 
based on private signals and observation of ternary messages 
from a predecessor in a tandem configuration. 

Another class of decentralized information fusion problems 
was introduced in [?]. In that work, there are again two 
hypotheses on the state of the world and each one of a set 
of agents receives a noisy signal regarding the true state. Each 
agent summarizes its information in a finitely-valued message 
which it sends to a fusion center. The fusion center solves a 
classical hypothesis testing problem (based on the messages 
it has received) and decides on one of the two hypotheses. 
The problem is the design of decision rules for each agent 
so as to minimize the probability of error at the fusion 
center. A more general network structure, in which each agent 
observes messages from a specific set of agents before making 
a decision was introduced in [?] and [?], under the assumption 
that the topology that describes the message flow is a directed 
tree. In all of this literature (and under the assumption that the 
private signals are conditionally independent, given the true 
hypothesis) each agent's decision rule should be a likelihood 
ratio test, parameterized by a scalar threshold. However, in 
general, the problem of optimizing the agent thresholds is 
a difficult nonconvex optimization problem — see [?] for a 
survey. 

In the line of work initiated in [?], the focus is often on 
tree architectures with large branching factors, so that the 
probability of error decreases exponentially in the number of 
sensors. In contrast, for tandem architectures, as in [?], [?], [?], 
[?], and for the related ones considered in this paper, learning 
often fails to hold or takes place at a slow, subexponential 
rate [?]. The focus of our paper is on this latter class of 
architectures and the conditions under which learning takes 
place. 

B. Economics literature 

A number of papers, starting with [?] and [?], study learning 
in a setting where each agent, prior to making a decision, 
observes the history of decisions by all of its predecessors. 
Each agent is a Bayesian maximizer of the probability that 
its decision is correct. The main finding is the emergence 
of "herds" or "informational cascades," where agents copy 
possibly incorrect decisions of their predecessors and ignore 
their own information, a phenomenon consistent with that 
discussed by Cover [?] for the tandem model with K = 1. The 
most complete analysis of this framework (i.e., with complete 
sharing of past decisions) is provided in [?], which also draws 
a distinction between the cases where the BLR assumption 
holds or fails to hold, and establishes results of the same flavor 
as those in [?]. 

A broader class of observation structures is studied in [?] 
and [?], with each agent observing an unordered sample of 
decisions drawn from the past, namely, the number of sampled 
predecessors who have taken each of the two actions. The 
most comprehensive analysis of this setting, where agents are 
Bayesian but do not observe the full history of past decisions, 
is provided in [?]. This paper considers agents who observe 
the decisions of a stochastically generated set of predecessors 
and provides conditions on the private signals and the network 
structure under which asymptotic learning (in probability) to 
the true state of the world is achieved. 

To the best of our knowledge, the first paper that studies 
forward looking agents is [?]: each agent minimizes the 
discounted sum of error probabilities of all subsequent agents, 
including their own. This reference considers the case where 
the full past history is observed and shows that herding on an 
incorrect decision is possible, with positive probability. (On 
the other hand, learning is possible if the BLR assumption is 
relaxed.) Finally, [?] considers a similar model and explicitly 
characterizes a simple and tractable equilibrium that generates 
a herd, showing again that even with payoff interdependence 
and forward looking incentives, payoff-maximizing agents 
who observe past decisions can fail to properly aggregate the 
available information. 

Fig. 1: The observation model. If the unknown state of the 
world is $\theta = j$, $j \in \{0, 1\}$, the agents receive independent 
private signals $s_n$ drawn from a distribution $\mathbb{F}_j$, and also 
observe the decisions of the K immediate predecessors. In 
this figure, K = 2. If agent n observes the decision of agent 
k, we draw an arrow pointing from k to n. 

III. The Model and Preliminaries 

In this section we present the observation model (illustrated 
in Figure 1) and introduce our basic terminology and notation. 

A. The observation model 

We consider an infinite sequence of agents, indexed by 
$n \in \mathbb{N}$, where $\mathbb{N}$ is the set of natural numbers. There is an 
underlying state of the world $\theta \in \{0, 1\}$, which is modeled 
as a random variable whose value is unknown to the agents. To 
simplify notation, we assume that both of the underlying states 
are a priori equally likely, that is, $\mathbb{P}(\theta = 0) = \mathbb{P}(\theta = 1) = 1/2$. 

Each agent n forms posterior beliefs about this state based 
on a private signal that takes values in a set S, and also 
by observing the decisions of its K immediate predecessors. 
We denote by $s_n$ the random variable representing agent n's 
private signal, while we use s to denote specific values in S. 
Conditional on the state of the world $\theta$ being equal to zero 
(respectively, one), the private signals are independent random 
variables distributed according to a probability measure $\mathbb{F}_0$ 
(respectively, $\mathbb{F}_1$) on the set S. Throughout the paper, the 
following two assumptions will always remain in effect. First, 
$\mathbb{F}_0$ and $\mathbb{F}_1$ are absolutely continuous with respect to each other, 
implying that no signal value can be fully revealing about the 
correct state. Second, $\mathbb{F}_0$ and $\mathbb{F}_1$ are not identical, so that the 
private signals can be informative. 

Each agent n is to make a decision, denoted by $x_n$, which 
takes values in {0, 1}. The information available to agent n 
consists of its private signal $s_n$ and the random vector 

$$v_n = (x_{n-K}, \ldots, x_{n-1})$$

of decisions of its K immediate predecessors. (For notational 
convenience, an agent i with index $i \le 0$ is identified with 
agent 1.) The decision $x_n$ is made according to a decision 
rule $d_n : \{0, 1\}^K \times S \to \{0, 1\}$: 

$$x_n = d_n(v_n, s_n).$$

A decision profile is a sequence $d = \{d_n\}_{n \in \mathbb{N}}$ of decision 
rules. Given a decision profile d, the sequence $x = \{x_n\}_{n \in \mathbb{N}}$ of 
agent decisions is a well defined stochastic process, described 
by a probability measure to be denoted by $\mathbb{P}_d$, or simply by $\mathbb{P}$ 
if d has been fixed. For notational convenience, we also use 
$\mathbb{P}^j(\cdot)$ to denote the conditional measure under the state of the 
world j, that is, 

$$\mathbb{P}^j(\cdot) = \mathbb{P}(\cdot \mid \theta = j).$$



It is also useful to consider randomized decision rules, 
whereby the decision $x_n$ is determined according to 
$x_n = d_n(z_n, v_n, s_n)$, where $z_n$ is an exogenous random variable 
which is independent for different n and also independent of 
$\theta$ and $(v_n, s_n)$. (The construction in Section VI will involve 
a randomized decision rule.) 
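
The observation model can be made concrete with a small simulation. The following is only a sketch under stated assumptions: the signals are taken to be Bernoulli with hypothetical parameters `p0` and `p1`, the names `observed` and `simulate` are ours, and agent 1 (which has no predecessors) is handed a placeholder vector of zeros.

```python
import random

def observed(x, n, K):
    """Return v_n = (x_{n-K}, ..., x_{n-1}); x[i-1] stores x_i.

    Indices i <= 0 are identified with agent 1. Agent 1 itself has no
    predecessors, so it receives a placeholder of zeros (an assumption).
    """
    if n == 1:
        return (0,) * K
    return tuple(x[max(i, 1) - 1] for i in range(n - K, n))

def simulate(theta, K, num_agents, rule, p0, p1, seed=0):
    """Simulate x_1, ..., x_N when P(s_n = 1 | theta = j) = p_j.

    `rule(n, v, s)` plays the role of the decision rule d_n(v_n, s_n).
    """
    rng = random.Random(seed)
    p = p1 if theta == 1 else p0
    x = []
    for n in range(1, num_agents + 1):
        s = 1 if rng.random() < p else 0  # private signal s_n
        v = observed(x, n, K)             # decisions of the K predecessors
        x.append(rule(n, v, s))
    return x

# Example rule: agent 1 follows its signal, everyone else copies x_{n-1}.
copy_rule = lambda n, v, s: s if n == 1 else v[-1]
decisions = simulate(theta=1, K=2, num_agents=20, rule=copy_rule, p0=0.3, p1=0.7)
```

Under the copying rule every agent repeats the first decision, which is the simplest instance of a profile that cannot learn.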

B. An assumption and the definition of learning 

As mentioned in the Introduction, we focus on the case 
where every possible private signal value has bounded infor- 
mation content. The assumption that follows will remain in 
effect throughout the paper and will not be stated explicitly in 
our results. 

Assumption 1. (Bounded Likelihood Ratios, BLR). There 
exist some $m > 0$ and $M < \infty$, such that the Radon-Nikodym 
derivative $d\mathbb{F}_0/d\mathbb{F}_1$ satisfies 

$$m < \frac{d\mathbb{F}_0}{d\mathbb{F}_1}(s) < M,$$

for almost all $s \in S$ under the measure $(\mathbb{F}_0 + \mathbb{F}_1)/2$. 
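
For a finite signal set, the BLR assumption can be checked directly, since the Radon-Nikodym derivative reduces to a ratio of probability mass functions. The sketch below assumes the two distributions are given as dictionaries; the function name and the example distributions are hypothetical.

```python
def likelihood_ratio_bounds(f0, f1):
    """Return (min, max) of f0(s) / f1(s) over a finite signal set.

    f0 and f1 map each signal value to its probability under theta = 0
    and theta = 1. Mutual absolute continuity requires both to be
    positive on the same set of values.
    """
    if set(f0) != set(f1):
        raise ValueError("f0 and f1 must be defined on the same signal set")
    if any(f0[s] <= 0 or f1[s] <= 0 for s in f0):
        raise ValueError("every signal value needs positive probability under both states")
    ratios = [f0[s] / f1[s] for s in f0]
    return min(ratios), max(ratios)

f0 = {"low": 0.75, "high": 0.25}
f1 = {"low": 0.25, "high": 0.75}
lo, hi = likelihood_ratio_bounds(f0, f1)
# Any m < lo and M > hi satisfy Assumption 1 for this pair.
```

On a finite set with full support the ratio is always bounded, so the assumption only becomes restrictive for infinite signal sets.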

We study two different types of learning. As will be seen 
in the sequel, the results for these two types are, in general, 
different. 

Definition 1. We say that a decision profile d achieves almost 
sure learning if 

$$\lim_{n \to \infty} x_n = \theta, \quad \mathbb{P}_d\text{-almost surely},$$

and that it achieves learning in probability if 

$$\lim_{n \to \infty} \mathbb{P}_d(x_n = \theta) = 1.$$
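
To see how the two modes can differ, consider a toy sequence that is not generated by the model above: suppose, purely for illustration, that the decisions err independently, with agent n wrong with probability 1/n. The error probability tends to zero, which is convergence in probability. Yet the probability of seeing no error among agents N, ..., T is the product of (1 - 1/n), which equals (N - 1)/T and vanishes as T grows, so errors recur forever and almost sure convergence fails. The sketch below computes that product exactly.

```python
from fractions import Fraction

def prob_no_error_between(N, T):
    """P(no error among agents N..T) when agent n errs independently w.p. 1/n."""
    prob = Fraction(1)
    for n in range(N, T + 1):
        prob *= 1 - Fraction(1, n)
    return prob

# The product telescopes to (N - 1) / T.
p = prob_no_error_between(5, 100)
```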

IV. Impossibility of almost sure learning 

In this section, we show that almost sure learning is impos- 
sible, for any value of K. 

Theorem 1. For any given number K of observed immediate 
predecessors, there exists no decision profile that achieves 
almost sure learning. 

The rest of this section is devoted to the proof of Theorem 1. 
We note that the proof does not use anywhere the fact that 
each agent only observes the last K immediate predecessors. 
The exact same proof establishes the impossibility of almost 
sure learning even for a more general model where each 
agent n observes the decisions of an arbitrary subset of its 
predecessors. Furthermore, while the proof is given for the 
case of deterministic decision rules, the reader can verify that 
it also applies to the case where randomized decision rules are 
allowed. 

The following lemma is a simple consequence of the BLR 
assumption and its proof is omitted. 

Lemma 1. For any $u \in \{0, 1\}^K$ and any $j \in \{0, 1\}$, we have 

$$m \cdot \mathbb{P}^1(x_n = j \mid v_n = u) \le \mathbb{P}^0(x_n = j \mid v_n = u) \le M \cdot \mathbb{P}^1(x_n = j \mid v_n = u), \tag{1}$$

where m and M are as in Assumption 1. 

Lemma 1 states that (under the BLR assumption) if under 
one state of the world some agent n, after observing u, decides 
j with positive probability, then the same must be true with 
proportional probability under the other state of the world. This 
proportional dependence of decision probabilities for the two 
possible underlying states is central to the proof of Theorem 1. 
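
The reasoning behind Lemma 1 can be checked numerically for finite signal sets. Given the state, the signal $s_n$ is independent of the observed vector $v_n$, so the conditional probability of deciding j is the total mass of the signal values that the rule maps to j. The sketch below assumes a hypothetical rule for K = 1 and hypothetical distributions; all names are ours.

```python
def decision_probability(rule, u, j, f):
    """P(x_n = j | v_n = u) when the signal has pmf f (dict: value -> prob)."""
    return sum(p for s, p in f.items() if rule(u, s) == j)

def satisfies_lemma1(rule, u, f0, f1, m, M):
    """Check m * P^1 <= P^0 <= M * P^1 for both decisions j in {0, 1}."""
    for j in (0, 1):
        p0 = decision_probability(rule, u, j, f0)
        p1 = decision_probability(rule, u, j, f1)
        if not (m * p1 <= p0 <= M * p1):
            return False
    return True

# Hypothetical rule: decide 1 on a "high" signal, otherwise copy the predecessor.
rule = lambda u, s: 1 if s == "high" else u[-1]
f0 = {"low": 0.75, "high": 0.25}
f1 = {"low": 0.25, "high": 0.75}
ok = satisfies_lemma1(rule, (0,), f0, f1, m=0.25, M=4.0)
```

When the predecessor decided 1, this rule never decides 0 under either state, so both probabilities are zero and the bounds hold with equality. That is why the inequalities in (1) are not strict.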

Before proceeding with the main part of the proof, we need 
two more lemmata. Consider a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and 
a sequence of events $E_k$, $k = 1, 2, \ldots$. The upper limiting 
set of the sequence, $\limsup_{k \to \infty} E_k$, is defined by 

$$\limsup_{k \to \infty} E_k = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} E_k.$$

(This is the event that infinitely many of the $E_k$ occur.) We 
will use a variation of the Borel-Cantelli lemma (Corollary 6.1 
in [?]) that does not require independence of events. 



Lemma 2. If 

$$\sum_{k=1}^{\infty} \mathbb{P}(E_k \mid E'_1 \cdots E'_{k-1}) = \infty,$$

then 

$$\mathbb{P}\Big(\limsup_{k \to \infty} E_k\Big) = 1,$$

where $E'_k$ denotes the complement of $E_k$. 

Finally, we will use the following algebraic fact. 

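Lemma 3. Consider a sequence $\{q_n\}_{n \in \mathbb{N}}$ of real numbers, 
with $q_n \in [0, 1]$, for all $n \in \mathbb{N}$. Then, 

$$1 - \sum_{n \in V} q_n \le \prod_{n \in V} (1 - q_n) \le \exp\Big(-\sum_{n \in V} q_n\Big),$$

for any $V \subseteq \mathbb{N}$. 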

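Proof: The second inequality is standard. For the first one, 
interpret the numbers $\{q_n\}_{n \in \mathbb{N}}$ as probabilities of independent 
events $\{E_n\}_{n \in \mathbb{N}}$. Then, clearly, 

$$\mathbb{P}\Big(\bigcup_{n \in V} E_n\Big) + \mathbb{P}\Big(\bigcap_{n \in V} E'_n\Big) = 1.$$

Observe that 

$$\mathbb{P}\Big(\bigcap_{n \in V} E'_n\Big) = \prod_{n \in V} (1 - q_n),$$

and by the union bound, 

$$\mathbb{P}\Big(\bigcup_{n \in V} E_n\Big) \le \sum_{n \in V} q_n.$$

Combining the above yields the desired result. 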
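
As a quick sanity check of Lemma 3, the three quantities can be computed for any finite list of numbers in [0, 1]. This is a minimal sketch; the function name and the sample values are ours.

```python
import math

def lemma3_bounds(q):
    """Return (1 - sum q_n, prod (1 - q_n), exp(-sum q_n)) for q_n in [0, 1]."""
    if any(not 0.0 <= qn <= 1.0 for qn in q):
        raise ValueError("every q_n must lie in [0, 1]")
    total = sum(q)
    product = math.prod(1.0 - qn for qn in q)
    return 1.0 - total, product, math.exp(-total)

lower, product, upper = lemma3_bounds([0.1, 0.2, 0.05])
# Lemma 3 asserts lower <= product <= upper.
```

The lower bound is only informative when the sum is below one, which is exactly the regime in which it is used later in the paper.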

We are now ready to prove the main result of this section. 
Proof of Theorem 1: Let U denote the set of all binary 
sequences with a finite number of zeros (equivalently, the 
set of binary sequences that converge to one). Suppose, to 
derive a contradiction, that we have almost sure learning. Then, 
$\mathbb{P}^1(x \in U) = 1$. The set U is easily seen to be countable, 
which implies that there exists an infinite binary sequence 
$u = \{u_n\}_{n \in \mathbb{N}}$ such that $\mathbb{P}^1(x = u) > 0$. In particular, 

$$\mathbb{P}^1(x_k = u_k, \text{ for all } k < n) > 0, \quad \text{for all } n \in \mathbb{N}.$$

Since $(x_1, x_2, \ldots, x_n)$ is determined by $(s_1, s_2, \ldots, s_n)$ and 
since the distributions of $(s_1, s_2, \ldots, s_n)$ under the two 
hypotheses are absolutely continuous with respect to each other, 
it follows that 

$$\mathbb{P}^0(x_k = u_k, \text{ for all } k < n) > 0, \quad \text{for all } n \in \mathbb{N}. \tag{2}$$
We define 

$$a_n^0 = \mathbb{P}^0(x_n \ne u_n \mid x_k = u_k, \text{ for all } k < n),$$
$$a_n^1 = \mathbb{P}^1(x_n \ne u_n \mid x_k = u_k, \text{ for all } k < n).$$

Lemma 1 implies that 

$$m \, a_n^1 \le a_n^0 \le M \, a_n^1, \tag{3}$$

because for $j \in \{0, 1\}$, 
$\mathbb{P}^j(x_n \ne u_n \mid x_k = u_k, \text{ for all } k < n) = \mathbb{P}^j(x_n \ne u_n \mid x_k = u_k, \text{ for } k = n - K, \ldots, n - 1)$. 



Suppose that 

$$\sum_{n=1}^{\infty} a_n^1 = \infty.$$

Then, Lemma 2, with the identification $E_k = \{x_k \ne u_k\}$, 
implies that the event $\{x_k \ne u_k, \text{ for some } k\}$ has probability 
1, under $\mathbb{P}^1$. Therefore, $\mathbb{P}^1(x = u) = 0$, which contradicts the 
definition of u. 



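Suppose now that $\sum_{n=1}^{\infty} a_n^1 < \infty$. Then, 

$$\sum_{n=1}^{\infty} a_n^0 \le M \cdot \sum_{n=1}^{\infty} a_n^1 < \infty,$$

and 

$$\lim_{N \to \infty} \sum_{n=N}^{\infty} \mathbb{P}^0(x_n \ne u_n \mid x_k = u_k, \text{ for all } k < n) = \lim_{N \to \infty} \sum_{n=N}^{\infty} a_n^0 = 0.$$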



Choose some N such that 

$$\sum_{n=N}^{\infty} \mathbb{P}^0(x_n \ne u_n \mid x_k = u_k, \text{ for all } k < n) < \frac{1}{2}.$$



Then, 

$$\mathbb{P}^0(x = u) = \mathbb{P}^0(x_k = u_k, \text{ for all } k < N) \cdot \prod_{n=N}^{\infty} \big(1 - \mathbb{P}^0(x_n \ne u_n \mid x_k = u_k, \text{ for all } k < n)\big).$$



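The first term on the right-hand side is positive by (2), while 

$$\prod_{n=N}^{\infty} \big(1 - \mathbb{P}^0(x_n \ne u_n \mid x_k = u_k, \text{ for all } k < n)\big) \ge 1 - \sum_{n=N}^{\infty} \mathbb{P}^0(x_n \ne u_n \mid x_k = u_k, \text{ for all } k < n) > \frac{1}{2}.$$

Combining the above, we obtain $\mathbb{P}^0(x = u) > 0$ and 

$$\liminf_{n \to \infty} \mathbb{P}^0(x_n = 1) \ge \mathbb{P}^0(x = u) > 0,$$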

which contradicts almost sure learning and completes the 
proof. ■ 
Given Theorem 1, in the rest of the paper we concentrate 
exclusively on the weaker notion of learning in probability, as 
defined in Section III-B. 

V. No learning in probability when K = 1 

In this section, we consider the case where K = 1, so 
that each agent only observes the decision of its immediate 
predecessor. Our main result, stated next, shows that learning 
in probability is not possible. 

Theorem 2. If K = 1, there exists no decision profile that 
achieves learning in probability. 

We fix a decision profile and use a Markov chain to represent 
the evolution of the decision process under a particular 
state of the world. In particular, we consider a two-state 
Markov chain whose state is the observed decision $x_{n-1}$. 
A transition from state i to state j for the Markov chain 
associated with $\theta = \ell$, where $i, j, \ell \in \{0, 1\}$, corresponds 
to agent n taking the decision j given that its immediate 
predecessor n - 1 decided i, under the state $\theta = \ell$. The Markov 
property is satisfied because the decision $x_n$, conditional on 
the immediate predecessor's decision, is determined by $s_n$ 
and hence is (conditionally) independent from the history of 
previous decisions. Since a decision profile d is fixed, we can 
again suppress d from our notation and define the transition 
probabilities of the two chains by 

$$a_n^{ij} = \mathbb{P}^0(x_n = j \mid x_{n-1} = i), \tag{4}$$
$$\bar{a}_n^{ij} = \mathbb{P}^1(x_n = j \mid x_{n-1} = i), \tag{5}$$

where $i, j \in \{0, 1\}$. The two chains are illustrated in Fig. 2. 
Note that in the current context, and similar to Lemma 1, the 
BLR assumption yields the inequalities 

$$m \cdot \bar{a}_n^{ij} \le a_n^{ij} \le M \cdot \bar{a}_n^{ij}, \tag{6}$$

where $i, j \in \{0, 1\}$, and $m > 0$, $M < \infty$, are as in 
Assumption 1. 
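
The two-state chain is easy to propagate numerically. The sketch below assumes the time-varying transition probabilities from state 0 to 1 and from state 1 to 0 are supplied as lists; the function name and the sample values are hypothetical.

```python
def propagate(p_one_initial, a01, a10):
    """Propagate P(x_n = 1) through a time-inhomogeneous two-state chain.

    a01[t] is the probability of moving from state 0 to state 1 at step t,
    and a10[t] the probability of moving from state 1 to state 0.
    Returns the list of P(x_n = 1) after each step.
    """
    p = p_one_initial
    history = []
    for up, down in zip(a01, a10):
        # Either we were at 0 and moved up, or we were at 1 and stayed.
        p = (1.0 - p) * up + p * (1.0 - down)
        history.append(p)
    return history

# Start at state 0 with certainty; never leave state 1 once reached.
path = propagate(0.0, a01=[0.5, 0.5], a10=[0.0, 0.0])
```

Running the same routine with the transition probabilities of each chain gives the decision probabilities under the two states of the world, which the proof below compares block by block.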

We now establish a further relation between the transition 
probabilities under the two states of the world. 

Lemma 4. If we have learning in probability, then 

$$\sum_{n=1}^{\infty} a_n^{01} = \infty, \tag{7}$$

and 

$$\sum_{n=1}^{\infty} a_n^{10} = \infty. \tag{8}$$

Fig. 2: The Markov chains that model the decision process 
for K = 1. States represent observed decisions. The transition 
probabilities under $\theta = 0$ or $\theta = 1$ are given by $a_n^{ij}$ 
and $\bar{a}_n^{ij}$, respectively. If learning in probability is to occur, 
the probability mass needs to become concentrated on the 
highlighted state. 

Proof: For the sake of contradiction, assume that 
$\sum_{n=1}^{\infty} a_n^{01} < \infty$. By Eq. (6), we also have $\sum_{n=1}^{\infty} \bar{a}_n^{01} < \infty$. 
Then, the expected number of transitions from state 0 to state 
1 is finite under either state of the world. In particular, the 
(random) number of such transitions is finite, almost surely. 
This can only happen if $\{x_n\}_{n=1}^{\infty}$ converges almost surely. 
However, almost sure convergence together with learning in 
probability would imply almost sure learning, which would 
contradict Theorem 1. The proof of the second statement in 
the lemma is similar. ■ 
The next lemma states that if we have learning in proba- 
bility, then the transition probabilities between different states 
should converge to zero. 



Lemma 5. If we have learning in probability, then 

$$\lim_{n \to \infty} a_n^{01} = 0. \tag{9}$$



Proof: Assume, to arrive at a contradiction, that there 
exists some $\epsilon \in (0, 1)$ such that 

$$a_n^{01} = \mathbb{P}^0(x_n = 1 \mid x_{n-1} = 0) > \epsilon,$$

for infinitely many values of n. Since we have learning in 
probability, we also have $\mathbb{P}^0(x_{n-1} = 0) > 1/2$ when n is 
large enough. This implies that for infinitely many values of 
n, 

$$\mathbb{P}^0(x_n = 1) \ge \mathbb{P}^0(x_n = 1 \mid x_{n-1} = 0) \, \mathbb{P}^0(x_{n-1} = 0) \ge \frac{\epsilon}{2}.$$

But this contradicts learning in probability. ■ 
We are now ready to complete the proof of Theorem 2 
by arguing as follows. Since the transition probabilities from 
state 0 to state 1 converge to zero, while their sum is infinite, 
under either state of the world, we can divide the agents (time) 
into blocks so that the corresponding sums of the transition 
probabilities from state 0 to state 1 over each block are 
approximately constant. If during such a block the sum of 
the transition probabilities from state 1 to state 0 is large, then 
under the state of the world $\theta = 1$, there is high probability 
of starting the block at state 1, moving to state 0, and staying 
at state 0 until the end of the block. If on the other hand 
the sum of the transition probabilities from state 1 to state 0 
is small, then under state of the world $\theta = 0$, there is high 
probability of starting the block at state 0, moving to state 1, 
and staying at state 1 until the end of the block. Both cases 
prevent convergence in probability to the correct decision. 

Proof of Theorem 2: We assume that we have learning 
in probability and will derive a contradiction. From Lemma 5, 
$\lim_{n \to \infty} a_n^{01} = 0$ and therefore there exists an $\hat{N} \in \mathbb{N}$ such that 
for all $n \ge \hat{N}$, 

$$a_n^{01} < \frac{m}{6}. \tag{10}$$

Moreover, by the learning in probability assumption, there 
exists some $\tilde{N} \in \mathbb{N}$ such that for all $n \ge \tilde{N}$, 

$$\mathbb{P}^0(x_n = 0) > \frac{1}{2}, \tag{11}$$

and 

$$\mathbb{P}^1(x_n = 1) > \frac{1}{2}. \tag{12}$$

Let $N = \max\{\hat{N}, \tilde{N}\}$, so that Eqs. (10)-(12) all hold for $n \ge N$. 

We divide the agents (time) into blocks so that in each block 
the sum of the transition probabilities from state 0 to state 1 
can be simultaneously bounded from above and below. We 
define the last agents of each block recursively, as follows: 

$$r_1 = N, \qquad r_{k+1} = \min\Big\{ l : \sum_{n=r_k+1}^{l} a_n^{01} \ge \frac{m}{2} \Big\}.$$

From Lemma 4, we have that $\sum_{n=N}^{\infty} a_n^{01} = \infty$. This fact, 
together with Eq. (10), guarantees that the sequence $r_k$ is well 
defined and strictly increasing. 

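Let $A_k$ be the block that ends with agent $r_{k+1}$, i.e., 
$A_k = \{r_k + 1, \ldots, r_{k+1}\}$. The construction of the sequence 
$\{r_k\}_{k \in \mathbb{N}}$ yields 

$$\sum_{n \in A_k} a_n^{01} \ge \frac{m}{2}.$$

On the other hand, $r_{k+1}$ is the first agent for which the sum 
is at least m/2 and since, by (10), $a_{r_{k+1}}^{01} < m/6$, we get that 

$$\sum_{n \in A_k} a_n^{01} \le \frac{m}{2} + \frac{m}{6} = \frac{2m}{3}.$$

Thus, 

$$\frac{m}{2} \le \sum_{n \in A_k} a_n^{01} \le \frac{2m}{3}, \tag{13}$$

and combining with Eq. (6), we also have 

$$\frac{m}{2M} \le \sum_{n \in A_k} \bar{a}_n^{01} \le \frac{2}{3}, \tag{14}$$

for all k. 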
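
The block construction can be sketched in a few lines. The code below assumes a finite prefix of the transition probabilities from state 0 to state 1 is available as a list indexed by agent (index 0 unused); the function name and the sample sequence are hypothetical, chosen so that every term is below m/6 and the sum diverges.

```python
def block_ends(a01, start, m):
    """Return [r_1, r_2, ...] for the block construction.

    r_1 = start, and r_{k+1} is the first agent at which the running sum of
    a01 over agents r_k + 1, r_k + 2, ... reaches m / 2. a01[n] is the
    transition probability for agent n; a01[0] is an unused placeholder.
    """
    ends = [start]
    running = 0.0
    for n in range(start + 1, len(a01)):
        running += a01[n]
        if running >= m / 2:
            ends.append(n)   # agent n closes the current block
            running = 0.0    # start accumulating for the next block
    return ends

m = 0.6
a01 = [0.0] + [1.0 / (n + 10) for n in range(1, 5001)]  # every term < m / 6
ends = block_ends(a01, start=1, m=m)
```

Because each term is smaller than m/6, every completed block has a sum between m/2 and 2m/3, which is the two-sided bound stated above.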

We consider two cases for the sum of transition probabilities 
from state 1 to state 0 during block $A_k$. We first assume that 

$$\sum_{n \in A_k} a_n^{10} > \frac{1}{2}.$$

Using Eq. (6), we obtain 

$$\sum_{n \in A_k} \bar{a}_n^{10} \ge \sum_{n \in A_k} \frac{1}{M} \cdot a_n^{10} > \frac{1}{2M}. \tag{15}$$



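The probability of a transition from state 1 to state 0 during 
the block $A_k$, under $\theta = 1$, is 

$$\mathbb{P}^1\Big(\bigcup_{n \in A_k} \{x_n = 0\} \,\Big|\, x_{r_k} = 1\Big) = 1 - \prod_{n \in A_k} (1 - \bar{a}_n^{10}).$$

Using Eq. (15) and Lemma 3, the product on the right-hand 
side can be bounded from above, 

$$\prod_{n \in A_k} (1 - \bar{a}_n^{10}) \le \exp\Big(-\sum_{n \in A_k} \bar{a}_n^{10}\Big) \le e^{-1/(2M)},$$

which yields 

$$\mathbb{P}^1\Big(\bigcup_{n \in A_k} \{x_n = 0\} \,\Big|\, x_{r_k} = 1\Big) \ge 1 - e^{-1/(2M)}.$$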

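After a transition to state 0 occurs, the probability of staying 
at that state until the end of the block is bounded below as 
follows: 

$$\mathbb{P}^1\Big(x_{r_{k+1}} = 0 \,\Big|\, \bigcup_{n \in A_k} \{x_n = 0\}\Big) \ge \prod_{n \in A_k} (1 - \bar{a}_n^{01}).$$

The right-hand side can be further bounded using Eq. (14) and 
Lemma 3, as follows: 

$$\prod_{n \in A_k} (1 - \bar{a}_n^{01}) \ge 1 - \sum_{n \in A_k} \bar{a}_n^{01} \ge \frac{1}{3}.$$

Combining the above and using (12), we conclude that 

$$\mathbb{P}^1(x_{r_{k+1}} = 0) \ge \mathbb{P}^1\Big(x_{r_{k+1}} = 0 \,\Big|\, \bigcup_{n \in A_k} \{x_n = 0\}\Big) \cdot \mathbb{P}^1\Big(\bigcup_{n \in A_k} \{x_n = 0\} \,\Big|\, x_{r_k} = 1\Big) \, \mathbb{P}^1(x_{r_k} = 1) \ge \frac{1}{3} \cdot \big(1 - e^{-1/(2M)}\big) \cdot \frac{1}{2}.$$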

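We now consider the second case and assume that 

$$\sum_{n \in A_k} a_n^{10} \le \frac{1}{2}.$$

The probability of a transition from state 0 to state 1 during 
the block $A_k$ is 

$$\mathbb{P}^0\Big(\bigcup_{n \in A_k} \{x_n = 1\} \,\Big|\, x_{r_k} = 0\Big) = 1 - \prod_{n \in A_k} (1 - a_n^{01}).$$

The product on the right-hand side can be bounded above 
using Lemma 3 and the lower bound in (13), 

$$\prod_{n \in A_k} (1 - a_n^{01}) \le \exp\Big(-\sum_{n \in A_k} a_n^{01}\Big) \le e^{-m/2},$$

which yields 

$$\mathbb{P}^0\Big(\bigcup_{n \in A_k} \{x_n = 1\} \,\Big|\, x_{r_k} = 0\Big) \ge 1 - e^{-m/2}.$$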

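After a transition to state 1 occurs, the probability of staying 
at that state until the end of the block is bounded from below 
as follows: 

$$\mathbb{P}^0\Big(x_{r_{k+1}} = 1 \,\Big|\, \bigcup_{n \in A_k} \{x_n = 1\}\Big) \ge \prod_{n \in A_k} (1 - a_n^{10}).$$

The right-hand side can be bounded using the assumption of 
this case and Lemma 3, as follows: 

$$\prod_{n \in A_k} (1 - a_n^{10}) \ge 1 - \sum_{n \in A_k} a_n^{10} \ge \frac{1}{2}.$$

Using also Eq. (11), we conclude that 

$$\mathbb{P}^0(x_{r_{k+1}} = 1) \ge \mathbb{P}^0\Big(x_{r_{k+1}} = 1 \,\Big|\, \bigcup_{n \in A_k} \{x_n = 1\}\Big) \cdot \mathbb{P}^0\Big(\bigcup_{n \in A_k} \{x_n = 1\} \,\Big|\, x_{r_k} = 0\Big) \, \mathbb{P}^0(x_{r_k} = 0) \ge \frac{1}{2} \cdot \big(1 - e^{-m/2}\big) \cdot \frac{1}{2}.$$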

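Combining the two cases we conclude that 

$$\liminf_{n \to \infty} \mathbb{P}_d(x_n \ne \theta) \ge \frac{1}{2} \min\Big\{ \frac{1}{6}\big(1 - e^{-1/(2M)}\big), \; \frac{1}{4}\big(1 - e^{-m/2}\big) \Big\} > 0, \tag{16}$$

which contradicts learning in probability and concludes the 
proof. ■ 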
Once more, we note that the proof and the result remain
valid for the case where randomized decision rules are
allowed.

The coupling between the Markov chains associated with
the two states of the world is central to the proof of Theorem 2.
The importance of the BLR assumption is highlighted by the
observation that if either $m = 0$ or $M = \infty$, then the lower
bound obtained in (16) is zero, and the proof fails. The next
section shows that a similar argument cannot be made to work
when $K \geq 2$. In particular, we construct a decision profile that
achieves learning in probability when agents observe the last
two immediate predecessors.

VI. Learning in probability when $K \geq 2$

In this section we show that learning in probability is
possible when $K \geq 2$, i.e., when each agent observes the
decisions of two or more of its immediate predecessors.

A. Reduction to the case of binary observations 

We will construct a decision profile that leads to learning in
probability, for the special case where the signals $s_n$ are binary
(Bernoulli) random variables with a different parameter under
each state of the world. This readily leads to a decision profile
that learns, for the case of general signals. Indeed, if the $s_n$ are
general random variables, each agent can quantize its signal,
to obtain a quantized signal $s'_n = h(s_n)$ that takes values
in $\{0,1\}$. Then, the agents can apply the decision profile for
the binary case. The only requirement is that the distribution
of $s'_n$ be different under the two states of the world. This is
straightforward to enforce by proper choice of the quantization
rule $h$: for example, we may let $h(s_n) = 1$ if and only if
$\mathbb{P}(\theta = 1 \mid s_n) > \mathbb{P}(\theta = 0 \mid s_n)$. It is not hard to verify
that with this construction and under our assumption that the
distributions $F_0$ and $F_1$ are not identical, the distributions of
$s'_n$ under the two states of the world will be different.
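The quantization step can be illustrated with a minimal sketch. The densities `f0` and `f1` below are hypothetical choices made only for this example (they are not from the paper); they live on $[0,1]$ and have a likelihood ratio bounded between 1/3 and 3. With the two states a priori equally likely, comparing posteriors reduces to comparing densities. The helper names `h` and `prob_quantized_one` are likewise illustrative.

```python
def f0(s):
    """Hypothetical signal density under theta = 0 on [0, 1]."""
    return 1.5 - s


def f1(s):
    """Hypothetical signal density under theta = 1 on [0, 1]."""
    return 0.5 + s


def h(s):
    """Quantizer: 1 iff the posterior favours theta = 1 (equal priors)."""
    return 1 if f1(s) > f0(s) else 0


def prob_quantized_one(density, cells=1000):
    """P(h(s) = 1) under `density`, using the midpoint rule on [0, 1]."""
    width = 1.0 / cells
    total = 0.0
    for i in range(cells):
        mid = (i + 0.5) * width
        total += h(mid) * density(mid) * width
    return total


p0 = prob_quantized_one(f0)  # Bernoulli parameter of s'_n under theta = 0
p1 = prob_quantized_one(f1)  # Bernoulli parameter of s'_n under theta = 1
```

For these example densities the quantized signal has different parameters under the two states, with `p0 < p1`, which is all the binary construction requires.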
We also note that it suffices to construct a decision profile
for the case where $K = 2$. Indeed, if $K > 2$, we can have
the agents ignore the actions of all but their two immediate
predecessors and employ the decision profile designed for the
case where $K = 2$.

B. The decision profile 

As just discussed, we assume that the signal $s_n$ is binary.
For $i = 0,1$, we let $p_i = \mathbb{P}^i(s_n = 1)$ and $q_i = 1 - p_i$.
We also use $p$ to denote a random variable that is equal to
$p_i$ if and only if $\theta = i$. Finally, we let $\bar{p} = (p_0 + p_1)/2$
and $\bar{q} = 1 - \bar{p} = (q_0 + q_1)/2$. We assume, without loss of
generality, that $p_0 < p_1$, in which case we have $p_0 < \bar{p} < p_1$
and $q_0 > \bar{q} > q_1$.

Let $\{k_m\}_{m \in \mathbb{N}}$ and $\{r_m\}_{m \in \mathbb{N}}$ be two sequences of positive
integers that we will define later in this section. We divide the
agents into segments that consist of S-blocks, R-blocks, and
transient agents, as follows. We do not assign the first two
agents to any segment (and the first segment starts with agent
$n = 3$). For segment $m \in \mathbb{N}$:

(i) the first $2k_m - 1$ agents belong to the block $S_m$;

(ii) the next agent is an SR transient agent;

(iii) the next $2r_m - 1$ agents belong to the block $R_m$;

(iv) the next agent is an RS transient agent.
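The segment layout above can be sketched as a small routine that labels each agent with its segment and role. The function name `agent_roles`, the role labels, and the callables `k` and `r` (standing in for the sequences $k_m$ and $r_m$) are assumptions made for illustration.

```python
def agent_roles(num_agents, k, r):
    """Return a list of (agent, segment, role) for agents 1..num_agents.

    k(m) and r(m) give the block-size parameters k_m and r_m.
    Roles: "init" (agents 1 and 2, outside any segment), "S", "SR",
    "R", "RS".
    """
    roles = [(1, 0, "init"), (2, 0, "init")][:num_agents]
    n, m = 3, 1  # the first segment starts with agent 3
    while n <= num_agents:
        pattern = (["S"] * (2 * k(m) - 1) + ["SR"]
                   + ["R"] * (2 * r(m) - 1) + ["RS"])
        for role in pattern:
            if n > num_agents:
                break
            roles.append((n, m, role))
            n += 1
        m += 1
    return roles


# Example with constant (hypothetical) block parameters k_m = 2, r_m = 1.
example = agent_roles(14, lambda m: 2, lambda m: 1)
```

With these example parameters each segment has $3 + 1 + 1 + 1 = 6$ agents, so agents 3 to 8 form segment 1 and agents 9 to 14 form segment 2.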

An agent's information consists of the last two decisions,
denoted by $v_n = (x_{n-2}, x_{n-1})$, and its own signal $s_n$. The
decision profile is constructed so as to enforce that if $n$ is the
first agent of either an S or R block, then $v_n = (0,0)$ or $(1,1)$.

(i) Agents 1 and 2 choose 0, irrespective of their private
signal.

(ii) During block $S_m$, for $m \geq 1$:

a) If the first agent of the block, denoted by $n$, observes
$(1,1)$, it chooses 1, irrespective of its private signal.
If it observes $(0,0)$ and its private signal is 1, then

$$x_n = z_n,$$

where $z_n$ is an independent Bernoulli random variable
with parameter $1/m$. If $z_n = 1$ we say that a
searching phase is initiated. (The cases of observing
$(1,0)$ or $(0,1)$ will not be allowed to occur.)

b) For the remaining agents in the block:

i) Agents who observe $(0,1)$ decide 0 for all private
signals.

ii) Agents who observe $(1,0)$ decide 1 if and only if
their private signal is 1.

iii) Agents who observe $(0,0)$ decide 0 for all private
signals.

iv) Agents who observe $(1,1)$ decide 1 for all private
signals.
(iii) During block $R_m$:

a) If the first agent of the block, denoted by $n$, observes
$(0,0)$, it chooses 0, irrespective of its private signal.
If it observes $(1,1)$ and its private signal is 0, then

$$x_n = 1 - z_n,$$

where $z_n$ is a Bernoulli random variable with parameter
$1/m$. If $z_n = 1$, we say that a searching phase is
initiated. (The cases of observing $(1,0)$ or $(0,1)$ will
not be allowed to occur.)

b) For the remaining agents in the block:

i) Agents who observe $(1,0)$ decide 1 for all private
signals.

ii) Agents who observe $(0,1)$ decide 0 if and only if
their private signal is 0.

iii) Agents who observe $(0,0)$ decide 0 for all private
signals.

iv) Agents who observe $(1,1)$ decide 1 for all private
signals.

(iv) An SR or RS transient agent $n$ sets $x_n = x_{n-1}$,
irrespective of its private signal.

Fig. 3: Illustration of the decision profile during block $S_m$:
(a) the decision rule for the first agent of block $S_m$; (b) the
decision rule for all agents of block $S_m$ but the first.
Here, $z_n$ is a Bernoulli random variable, independent from $s_n$
or $v_n$, which takes the value $z_n = 1$ with a small probability
$1/m$. In this figure, the state represents the decisions of the
last two agents and the decision rule dictates the probabilities
of transition between states.
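The rules for S-blocks, R-blocks, and transient agents can be collected into a single function, sketched below. The function name `decide` and its argument layout are illustrative. Two cases are left implicit in the text, namely the first agent of an S-block observing $(0,0)$ with signal 0, and the first agent of an R-block observing $(1,1)$ with signal 1; the sketch assumes that the agent then keeps the current state (decides 0 and 1, respectively), which is what the switching probabilities in the later analysis suggest.

```python
def decide(role, is_first, v, s, z=0):
    """Decision x_n of one agent in segment m.

    role: "S", "R", "SR" or "RS"; is_first: first agent of its block;
    v = (x_{n-2}, x_{n-1}); s: binary private signal;
    z: Bernoulli(1/m) draw, used only by the first agent of a block.
    """
    if role in ("SR", "RS"):
        return v[1]                      # transient agents copy x_{n-1}
    if is_first:
        if role == "S":
            if v == (1, 1):
                return 1
            # v == (0, 0): search starts only if s == 1 and z == 1
            # (the s == 0 case is assumed to give 0).
            return z if s == 1 else 0
        if v == (0, 0):                  # role == "R"
            return 0
        # v == (1, 1): search starts only if s == 0 and z == 1
        # (the s == 1 case is assumed to give 1).
        return 1 - z if s == 0 else 1
    if v == (0, 0):
        return 0
    if v == (1, 1):
        return 1
    if role == "S":
        # (0, 1) -> 0 always; (1, 0) -> 1 iff the signal is 1.
        return 0 if v == (0, 1) else s
    # R-block: (1, 0) -> 1 always; (0, 1) -> 0 iff the signal is 0.
    return 1 if v == (1, 0) else s
```

For instance, `decide("S", True, (0, 0), 1, 1)` starts a searching phase, while `decide("S", False, (1, 0), 0)` ends one.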
We now discuss the evolution of the decisions (see also
Figure 3 for an illustration of the different transitions). We
first note that because $v_3 = (x_1, x_2) = (0,0)$ and because
of the rules for transient agents, our requirement that $v_n$ be
either $(0,0)$ or $(1,1)$ when $n$ lies at the beginning of a block, is
automatically satisfied. Next, we discuss the possible evolution
of $v_n$ in the course of a block $S_m$. (The case of a block $R_m$
is entirely symmetrical.) Let $n$ be the first agent of the block,
and note that the last agent of the block is $n + 2k_m - 2$.

1) If $v_n = (1,1)$, then $v_i = (1,1)$ for all agents $i$ in the
block, as well as for the subsequent SR transient agent,
which is agent $n + 2k_m - 1$. The latter agent also decides
1, so that the first agent of the next block, $R_m$, observes
$v_{n+2k_m} = (1,1)$.

2) If $v_n = (0,0)$ and $x_n = 0$, then $v_i = (0,0)$ for all agents
$i$ in the block, as well as for the subsequent SR transient
agent, which is agent $n + 2k_m - 1$. The latter agent also
decides 0, so that the first agent of the next block, $R_m$,
observes $v_{n+2k_m} = (0,0)$.

3) The interesting case occurs when $v_n = (0,0)$, $s_n = 1$,
and $z_n = 1$, so that a search phase is initiated and $x_n = 1$,
$v_{n+1} = (0,1)$, $x_{n+1} = 0$, $v_{n+2} = (1,0)$. Here there are
two possibilities:

a) Suppose that for every $i > n$ in the block $S_m$, for
which $i - n$ is even (and with $i$ not the last agent in the
block), we have $s_i = 1$. Then, for $i - n$ even, we will
have $v_i = (1,0)$, $x_i = 1$, $v_{i+1} = (0,1)$, $x_{i+1} = 0$,
$v_{i+2} = (1,0)$, etc. When $i$ is the last agent of the
block, then $i = n + 2k_m - 2$, so that $i - n$ is even,
$v_i = (1,0)$, and $x_i = 1$. The subsequent SR transient
agent, agent $n + 2k_m - 1$, sets $x_{n+2k_m-1} = 1$, so
that the first agent of the next block, $R_m$, observes
$v_{n+2k_m} = (1,1)$.

b) Suppose that for some $i > n$ in the block $S_m$, for
which $i - n$ is even, we have $s_i = 0$. Let $i$ be the
first agent in the block with this property. We have
$v_i = (1,0)$ (as in the previous case), but $x_i = 0$,
so that $v_{i+1} = (0,0)$. Then, all subsequent decisions
in the block, as well as by the next SR transient
agent are 0, and the first agent of the next block,
$R_m$, observes $v_{n+2k_m} = (0,0)$.

To understand the overall effect of our construction, we 
consider a (non-homogeneous) Markov chain representation 
of the evolution of decisions. We focus on the subsequence of 
agents consisting of the first agent of each S- and R-block. By 
the construction of the decision profile, the state v n , restricted 
to this subsequence, can only take values (0, 0) or (1, 1), and 
its evolution can be represented by a 2-state Markov chain. 
The transition probabilities between the states in this Markov
chain are given by a product of terms, the number of which is
related to the size of the S- and R-blocks. For learning to occur, 
there has to be an infinite number of switches between the two 
states in the Markov chain (otherwise getting trapped in an 
incorrect decision would have positive probability). Moreover, 
the probability of these switches should go to zero (otherwise 
there would be a probability of switching to the incorrect 
decision that is bounded away from zero). We obtain these 
features by allowing switches from state (0,0) to state (1, 1) 
during S-blocks and switches from state (1,1) to state (0,0) 
during R-blocks. By suitably defining blocks of increasing 
size, we can ensure that the probabilities of such switches 
remain positive but decay at a desired rate. This will be 
accomplished by the parameter choices described next. 

Let $\log(\cdot)$ stand for the natural logarithm. For $m$ large
enough so that $\log m$ is larger than both $1/\bar{p}$ and $1/\bar{q}$, we
let

$$k_m = \left\lceil \log_{1/\bar{p}}(\log m) \right\rceil \tag{17}$$

and

$$r_m = \left\lceil \log_{1/\bar{q}}(\log m) \right\rceil, \tag{18}$$

both of which are positive numbers. Otherwise, for small $m$,
we let $k_m = r_m = 1$. These choices guarantee learning.
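To get a feel for how slowly the block sizes grow, the parameter choices can be evaluated numerically. The sketch below assumes that (17) and (18) use ceilings, which is what the two-sided bound on $k_m$ in the proof of Lemma 8 indicates; the function name `block_params` is illustrative.

```python
import math


def block_params(m, p_bar):
    """Return (k_m, r_m) for segment m, given p_bar = (p0 + p1) / 2."""
    q_bar = 1.0 - p_bar
    # For small m (log m not larger than both 1/p_bar and 1/q_bar),
    # the default choice k_m = r_m = 1 is used.
    if m < 2 or math.log(m) <= max(1.0 / p_bar, 1.0 / q_bar):
        return 1, 1
    log_log_m = math.log(math.log(m))
    k_m = math.ceil(log_log_m / math.log(1.0 / p_bar))  # log base 1/p_bar
    r_m = math.ceil(log_log_m / math.log(1.0 / q_bar))  # log base 1/q_bar
    return k_m, r_m
```

With `p_bar = 0.5`, segment `m = 8` already has `k_m = 2`, but even segment `m = 10**6` only has `k_m = 4`, reflecting the doubly logarithmic growth.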

Theorem 3. Under the decision profile and the parameter
choices described in this section,

$$\lim_{n \to \infty} \mathbb{P}(x_n = \theta) = 1.$$

C. Proof of Theorem 3

The proof relies on the following fact. 
Lemma 6. Fix an integer $L \geq 2$. If $\alpha > 1$, then the series

$$\sum_{m=L}^{\infty} \frac{1}{m \log^{\alpha}(m)}$$

converges; if $\alpha \leq 1$, then the series diverges.

Proof: See Theorem 3.29 of [?]. ■
The next lemma characterizes the transition probabilities of
the non-homogeneous Markov chain associated with the state
of the first agent of each block. For any $m \in \mathbb{N}$, let $w_{2m-1}$ be
the decision of the last agent before block $S_m$, and let $w_{2m}$ be
the decision of the last agent before block $R_m$. Note that for
$m = 1$, $w_{2m-1} = w_1$ is the decision $x_2 = 0$, since the first
agent of block $S_1$ is agent 3. More generally, when $i$ is odd
(respectively, even), $w_i$ describes the state at the beginning
of an S-block (respectively, R-block), and in particular, the
decision of the transient agent preceding the block.



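Lemma 7. We have

$$\mathbb{P}(w_{i+1} = 1 \mid w_i = 0) = \begin{cases} p^{k_{m(i)}} \cdot \dfrac{1}{m(i)}, & \text{if } i \text{ is odd}, \\ 0, & \text{otherwise}, \end{cases}$$

and

$$\mathbb{P}(w_{i+1} = 0 \mid w_i = 1) = \begin{cases} q^{r_{m(i)}} \cdot \dfrac{1}{m(i)}, & \text{if } i \text{ is even}, \\ 0, & \text{otherwise}, \end{cases}$$

where

$$m(i) = \begin{cases} (i+1)/2, & \text{if } i \text{ is odd}, \\ i/2, & \text{if } i \text{ is even}. \end{cases}$$

(The above conditional probabilities are taken under either
state of the world $\theta$, with the parameters $p$ and $q$ on the
right-hand side being the corresponding probabilities that $s_n = 1$
and $s_n = 0$.)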

Proof: Note that $m(i)$ is defined so that $w_i$ is associated
with the beginning of either block $S_{m(i)}$ or $R_{m(i)}$, depending
on whether $i$ is odd or even, respectively.

Suppose that $i$ is odd, so that we are dealing with the
beginning of an S-block. If $w_i = 1$, then, as discussed in
the previous subsection, we will have $w_{i+1} = 1$, which proves
that $\mathbb{P}(w_{i+1} = 0 \mid w_i = 1) = 0$.

Suppose now that $i$ is odd and $w_i = 0$. In this case, there
exists only one particular sequence of events under which the
state will change to $w_{i+1} = 1$. Specifically, the searching
phase should be initiated (which happens with probability
$1/m(i)$), and the private signals of about half of the agents
in the block $S_{m(i)}$ ($k_{m(i)}$ of them) should be equal to 1. The
probability of this sequence of events is precisely the one given
in the statement of the lemma.

The transition probabilities for the case where $i$ is even are
obtained by a symmetrical argument. ■
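The switching probability for an S-block can be checked by a small Monte Carlo experiment. The sketch below simulates a single S-block entered in state $(0,0)$ and counts how often it is left with the last decision equal to 1. The function names, the fixed seed, and the number of trials are illustrative choices; the sketch also assumes that the first agent decides 0 when its signal is 0, a case the rules leave implicit.

```python
import random


def s_block_switches(k, m, p, rng):
    """Simulate one S-block of 2k - 1 agents entered in state (0, 0).

    Returns True if the last decision of the block is 1, so that the
    SR transient agent (who copies it) also decides 1.
    p = P(s_n = 1); the Bernoulli draw z_n equals 1 with probability 1/m.
    """
    v = (0, 0)
    for offset in range(2 * k - 1):
        s = 1 if rng.random() < p else 0
        if offset == 0:
            z = 1 if rng.random() < 1.0 / m else 0
            x = z if s == 1 else 0   # signal-0 case assumed to give 0
        elif v == (1, 0):
            x = s                    # decide 1 iff the private signal is 1
        else:
            x = 0                    # states (0, 1) and (0, 0) give 0
        v = (v[1], x)
    return v[1] == 1


def estimate_switch_prob(k, m, p, trials=200_000, seed=0):
    """Monte Carlo estimate of P(w_{i+1} = 1 | w_i = 0) for an S-block."""
    rng = random.Random(seed)
    hits = sum(s_block_switches(k, m, p, rng) for _ in range(trials))
    return hits / trials
```

For example, `estimate_switch_prob(2, 2, 0.6)` should come out close to $p^{k}/m = 0.6^2/2 = 0.18$, up to sampling error.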

The reason behind our definition of $k_m$ and $r_m$ is that we
wanted to enforce Eqs. (19)-(20) in the lemma that follows.



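Lemma 8. We have

$$\sum_{m=1}^{\infty} \frac{1}{m}\, p_1^{k_m} = \infty, \qquad \sum_{m=1}^{\infty} \frac{1}{m}\, q_1^{r_m} < \infty, \tag{19}$$

and

$$\sum_{m=1}^{\infty} \frac{1}{m}\, p_0^{k_m} < \infty, \qquad \sum_{m=1}^{\infty} \frac{1}{m}\, q_0^{r_m} = \infty. \tag{20}$$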



Proof: For $m$ large enough, the definition of $k_m$ implies
that

$$\log_{\bar{p}}\left(\frac{1}{\log m}\right) \leq k_m < \log_{\bar{p}}\left(\frac{1}{\log m}\right) + 1,$$

or equivalently,

$$p^{\log_{\bar{p}}(1/\log m) + 1} < p^{k_m} \leq p^{\log_{\bar{p}}(1/\log m)},$$

where $p$ stands for either $p_0$ or $p_1$. (Note that the direction
of the inequalities was reversed because the base $p$ of the
exponentials is less than 1.) Dividing by $m$, using the identity
$p = \bar{p}^{\log_{\bar{p}}(p)}$, after some elementary manipulations, we obtain

$$p \cdot \frac{1}{m \log^{\alpha} m} < \frac{1}{m}\, p^{k_m} \leq \frac{1}{m \log^{\alpha} m},$$

where $\alpha = \log_{\bar{p}}(p)$. By a similar argument,

$$q \cdot \frac{1}{m \log^{\beta} m} < \frac{1}{m}\, q^{r_m} \leq \frac{1}{m \log^{\beta} m},$$

where $\beta = \log_{\bar{q}}(q)$.

Suppose that $p = p_1$, so that $p > \bar{p}$ and $q < \bar{q}$. Note
that $\alpha$ is a decreasing function of $p$, because the base of the
logarithm satisfies $\bar{p} < 1$. Since $\log_{\bar{p}}(\bar{p}) = 1$, it follows that
$\alpha = \log_{\bar{p}}(p) < 1$, and by a parallel argument, $\beta > 1$. Lemma
6 then implies that conditions (19) hold. Similarly, if $p = p_0$,
so that $p < \bar{p}$ and $q > \bar{q}$, then $\alpha > 1$ and $\beta < 1$, and conditions
(20) follow again from Lemma 6. ■
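The "elementary manipulations" in the proof can be spelled out as follows. This is a worked sketch that uses only the identity $p = \bar{p}^{\log_{\bar{p}}(p)}$ and the definition $\alpha = \log_{\bar{p}}(p)$; the shorthand $x$ is introduced here for illustration only.

```latex
% Shorthand (not in the original): x = \log_{\bar p}(1/\log m),
% so that \bar p^{\,x} = 1/\log m.
\begin{align*}
p^{x} &= \bigl(\bar p^{\,\log_{\bar p}(p)}\bigr)^{x}
       = \bigl(\bar p^{\,x}\bigr)^{\alpha}
       = \frac{1}{\log^{\alpha} m}, \\
p^{x+1} &= p \cdot p^{x} = \frac{p}{\log^{\alpha} m}.
\end{align*}
% Substituting into p^{x+1} < p^{k_m} \le p^{x} and dividing by m:
\[
\frac{p}{m \log^{\alpha} m} \;<\; \frac{1}{m}\, p^{k_m} \;\le\; \frac{1}{m \log^{\alpha} m}.
\]
```

The bound for $q^{r_m}$ follows in the same way, with $\bar{q}$, $r_m$, and $\beta$ in place of $\bar{p}$, $k_m$, and $\alpha$.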

We are now ready to complete the proof, using a standard 
Borel-Cantelli argument. 

Proof of Theorem 3: Suppose that $\theta = 1$. Then, by
Lemmata 7 and 8, we have that

$$\sum_{i=1}^{\infty} \mathbb{P}^1(w_{i+1} = 1 \mid w_i = 0) = \infty,$$

while

$$\sum_{i=1}^{\infty} \mathbb{P}^1(w_{i+1} = 0 \mid w_i = 1) < \infty.$$

Therefore, transitions from state 0 of the Markov chain
$\{w_i\}$ to state 1 are guaranteed to happen, while transitions
from state 1 to state 0 will happen only finitely many times.
It follows that $w_i$ converges to 1, almost surely, when $\theta = 1$.
By a symmetrical argument, $w_i$ converges to 0, almost surely,
when $\theta = 0$.

Having proved (almost sure) convergence of the sequence
$\{w_i\}_{i \in \mathbb{N}}$, it remains to prove convergence (in probability) of
the sequence $\{x_n\}_{n \in \mathbb{N}}$ (of which $\{w_i\}_{i \in \mathbb{N}}$ is a subsequence).
This is straightforward, and we only outline the argument. If
$w_i$ is the decision $x_n$ at the beginning of a segment, then $x_n = w_i$
for all $n$ during that segment, unless a searching phase is
initiated. A searching phase gets initiated with probability at
most $1/m$ at the beginning of the S-block and with probability
at most $1/m$ at the beginning of the R-block. Since these
probabilities go to zero as $m \to \infty$, it is not hard to show that
$x_n$ converges in probability to the same limit as $w_i$.
■

The existence of a decision profile that guarantees learning 
in probability naturally leads to the question of providing 
incentives to agents to behave accordingly. It is known [?], [?], 
[?] that for Bayesian agents who minimize the probability of 
an erroneous decision, learning in probability does not occur, 
which brings up the question of designing a game whose 
equilibria have desirable learning properties. A natural choice 
for such a game is explored in the next section, although our 
results will turn out to be negative. 

VII. Forward looking agents 

In this section, we assign to each agent a payoff function that 
depends on its own decision as well as on future decisions. We 
consider the resulting game between the agents and study the 
learning properties of the equilibria of this game. In particular, 
we show that learning fails to obtain at any of these equilibria. 

A. Preliminaries and notation 

In order to conform to game-theoretic terminology, we will
now talk about strategies $\sigma_n$ (instead of decision rules $d_n$). A
(pure) strategy for agent $n$ is a mapping $\sigma_n : \{0,1\}^K \times S \to \{0,1\}$
from the agent's information set (the vector
$v_n = (x_{n-1}, \ldots, x_{n-K})$ of decisions of its $K$ immediate
predecessors and its private signal $s_n$) to a binary decision,
so that $x_n = \sigma_n(v_n, s_n)$. A strategy profile is a sequence
of strategies, $\sigma = \{\sigma_n\}_{n \in \mathbb{N}}$. We use the standard notation
$\sigma_{-n} = \{\sigma_1, \ldots, \sigma_{n-1}, \sigma_{n+1}, \ldots\}$ to denote the collection of
strategies of all agents other than $n$, so that $\sigma = \{\sigma_n, \sigma_{-n}\}$.
Given a strategy profile $\sigma$, the resulting sequence of decisions
$\{x_n\}_{n \in \mathbb{N}}$ is a well defined stochastic process.

The payoff function of agent $n$ is

$$\sum_{k=n}^{\infty} \delta^{k-n}\, 1_{x_k = \theta}, \tag{21}$$

where $\delta \in (0,1)$ is a discount factor, and $1_A$ denotes the
indicator random variable of an event $A$. Consider some agent
$n$ and suppose that the strategy profile $\sigma_{-n}$ of the remaining
agents has been fixed. Suppose that agent $n$ observes a particular
vector $u$ of predecessor decisions (a realization of $v_n$) and
a realized value $s$ of the private signal $s_n$. Note that $(v_n, s_n)$
has a well defined distribution once $\sigma_{-n}$ has been fixed, and
can be used by agent $n$ to construct a conditional distribution
(a posterior belief) on $\theta$. Agent $n$ now considers the two
alternative decisions, 0 or 1. For any particular decision that
agent $n$ can make, the decisions of subsequent agents $k$ will be
fully determined by the recursion $x_k = \sigma_k(v_k, s_k)$, and will
also be well defined random variables. This means that the
conditional expectation of agent $n$'s payoff, if agent $n$ makes
a specific decision $y \in \{0,1\}$,

$$U_n(y; u, s) = \mathbb{E}\left[ 1_{y = \theta} + \sum_{k=n+1}^{\infty} \delta^{k-n}\, 1_{x_k = \theta} \,\middle|\, v_n = u,\ s_n = s \right],$$

is unambiguously defined, modulo the usual technical caveats
associated with conditioning on zero probability events; in
particular, the conditional expectation is uniquely defined for
"almost all" $(u, s)$, that is, except on a set of $(v_n, s_n)$ values
that have zero probability measure under $\sigma_{-n}$. We can now
define our notion of equilibrium, which requires that given the
decision profile of the other agents, each agent maximizes its
conditional expected payoff $U_n(y; u, s)$ over $y \in \{0,1\}$, for
almost all $(u, s)$.

Definition 2. A strategy profile $\sigma$ is an equilibrium if for
each $n \in \mathbb{N}$, for each vector of observed actions $u \in \{0,1\}^K$
that can be realized under $\sigma$ with positive probability (i.e.,
$\mathbb{P}(v_n = u) > 0$), and for almost all $s \in S$, $\sigma_n$ maximizes the
expected payoff of agent $n$ given the strategies of the other
agents, $\sigma_{-n}$, i.e.,

$$\sigma_n(u, s) \in \operatorname*{arg\,max}_{y \in \{0,1\}} U_n(y; u, s).$$
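As a concrete reading of the payoff (21) and of the maximization in Definition 2, the sketch below evaluates the discounted payoff along one realized, finitely truncated decision sequence and picks a maximizer between two given expected payoffs. The function names and the truncation to a finite horizon are assumptions made for illustration; the model itself uses an infinite sum and conditional expectations.

```python
def discounted_payoff(decisions, theta, delta):
    """Truncated payoff: sum over j of delta**j * 1{decisions[j] == theta}.

    decisions[0] is agent n's own decision x_n, decisions[1] is x_{n+1},
    and so on; the infinite sum in (21) is cut off after len(decisions)
    terms.
    """
    return sum(delta ** j for j, x in enumerate(decisions) if x == theta)


def best_responses(u0, u1):
    """Set of decisions y in {0, 1} maximizing the expected payoffs.

    u0 and u1 stand for U_n(0; u, s) and U_n(1; u, s), respectively.
    """
    best = max(u0, u1)
    return {y for y, value in ((0, u0), (1, u1)) if value == best}
```

For example, with `delta = 0.5` and true state 1, the sequence `[1, 1, 0, 1]` earns $1 + 0.5 + 0 + 0.125 = 1.625$. A sequence that is correct at every step earns at most $1/(1-\delta)$.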

Our main result follows. 

Theorem 4. For any discount factor $\delta \in [0,1)$ and for any
equilibrium strategy profile, learning fails to hold.

We note that the set of equilibria, as per Definition 2,
contains the Perfect Bayesian Equilibria, as defined in [?].
Therefore, Theorem 4 implies that there is no learning at any
Perfect Bayesian Equilibrium.

From now on, we assume that we fixed a specific strategy
profile $\sigma$. Our analysis centers around the case where an agent
observes a sequence of ones from its immediate predecessors,
that is, $v_n = e$, where $e = (1, 1, \ldots, 1)$. The posterior
probability that the state of the world is equal to 1, based
on having observed a sequence of ones is defined by

$$\pi_n = \mathbb{P}(\theta = 1 \mid v_n = e).$$

Here, and in the sequel, we use $\mathbb{P}$ to indicate probabilities of
various random variables under the distribution induced by $\sigma$,
and similarly for the conditional measures $\mathbb{P}^j$ given that the
state of the world is $j \in \{0,1\}$. For any private signal value
$s \in S$, we also define

$$f_n(s) = \mathbb{P}(\theta = 1 \mid v_n = e,\ s_n = s).$$

Note that these conditional probabilities are well defined as
long as $\mathbb{P}(v_n = e) > 0$ and for almost all $s$. We also let

$$f_n = \operatorname*{ess\,inf}_{s \in S} f_n(s).$$

Finally, for every agent $n$, we define the switching probability
under the state of the world $\theta = 1$, by

$$\gamma_n = \mathbb{P}^1(\sigma_n(e, s_n) = 0).$$

We will prove our result by contradiction, and so we assume
that $\sigma$ is an equilibrium that achieves learning in probability.
In that case, under state of the world $\theta = 1$, all agents will
eventually be choosing 1 with high probability. Therefore,
when $\theta = 1$, blocks of size $K$ with all agents choosing 1
(i.e., with $v_n = e$) will also occur with high probability. The
Bayes rule will then imply that the posterior probability that
$\theta = 1$, given that $v_n = e$, will eventually be arbitrarily close
to one. The above are formalized in the next Lemma.

Lemma 9. Suppose that the strategy profile $\sigma$ leads to
learning in probability. Then,

(i) $\lim_{n \to \infty} \mathbb{P}^0(v_n = e) = 0$ and $\lim_{n \to \infty} \mathbb{P}^1(v_n = e) = 1$,

(ii) $\lim_{n \to \infty} \pi_n = 1$,

(iii) $\lim_{n \to \infty} f_n(s) = 1$, uniformly over all $s \in S$, except
possibly on a zero measure subset of $S$,

(iv) $\lim_{n \to \infty} \gamma_n = 0$.

Proof:

(i) Fix some $\epsilon > 0$. By the learning in probability assumption,

$$\lim_{n \to \infty} \mathbb{P}^0(v_n = e) \leq \lim_{n \to \infty} \mathbb{P}^0(x_n = 1) = 0.$$

Furthermore, there exists $N \in \mathbb{N}$ such that for all $n > N$,

$$\mathbb{P}^1(x_n = 0) < \frac{\epsilon}{K}.$$

Using the union bound, we obtain

$$\mathbb{P}^1(v_n = e) \geq 1 - \sum_{k=n-K}^{n-1} \mathbb{P}^1(x_k = 0) > 1 - \epsilon,$$

for all $n > N + K$. Thus, $\liminf_{n \to \infty} \mathbb{P}^1(v_n = e) \geq 1 - \epsilon$.
Since $\epsilon$ is arbitrary, the result for $\mathbb{P}^1(v_n = e)$ follows.

(ii) Using the Bayes rule and the fact that the two values of
$\theta$ are a priori equally likely, we have

$$\pi_n = \frac{\mathbb{P}^1(v_n = e)}{\mathbb{P}^0(v_n = e) + \mathbb{P}^1(v_n = e)}.$$

The result follows from part (i).

(iii) Since the two states of the world are a priori equally
likely, the ratio $f_n(s)/(1 - f_n(s))$ of posterior probabilities
is equal to the likelihood ratio associated with the
information $v_n = e$ and $s_n = s$, i.e.,

$$\frac{f_n(s)}{1 - f_n(s)} = \frac{\mathbb{P}^1(v_n = e)}{\mathbb{P}^0(v_n = e)} \cdot \frac{d\mathbb{P}^1}{d\mathbb{P}^0}(s),$$

almost everywhere, where we have used the independence
of $v_n$ and $s_n$ under either state of the world. Using
the BLR assumption,

$$\frac{f_n(s)}{1 - f_n(s)} \geq \frac{1}{M} \cdot \frac{\mathbb{P}^1(v_n = e)}{\mathbb{P}^0(v_n = e)},$$

almost everywhere. Hence, using the result in part (i),

$$\lim_{n \to \infty} \frac{f_n(s)}{1 - f_n(s)} = \infty,$$

uniformly over all $s \in S$, except possibly over a countable
union of zero measure sets (one zero measure set for
each $n$). It follows that $\lim_{n \to \infty} f_n(s) = 1$, uniformly
over $s \in S$, except possibly on a zero measure set.

(iv) We note that

$$\mathbb{P}^1(x_n = 0,\ v_n = e) = \mathbb{P}^1(v_n = e) \cdot \gamma_n.$$

Since $\mathbb{P}^1(x_n = 0,\ v_n = e) \leq \mathbb{P}^1(x_n = 0)$, we have
$\lim_{n \to \infty} \mathbb{P}^1(x_n = 0,\ v_n = e) = 0$. Furthermore, from
part (i), $\lim_{n \to \infty} \mathbb{P}^1(v_n = e) = 1$. It follows that
$\lim_{n \to \infty} \gamma_n = 0$.



We now proceed to the main part of the proof. We will argue 
that under the learning assumption, and in the limit of large n, 
it is more profitable for agent n to choose 1 when observing 
a sequence of ones from its immediate predecessors, rather 
than choose 0, irrespective of its private signal. This implies 
that after some finite time N, the agents will be copying their 
predecessors' action, which is inconsistent with learning. 

Proof of Theorem 4: Fix some $\epsilon \in (0, 1 - \delta)$. We define

$$t_n = \sup\left\{ t : \sum_{k=n}^{n+t} \gamma_k \leq \epsilon \right\}.$$

(Note that $t_n$ can be, in principle, infinite.) Since $\gamma_k$ converges
to zero (Lemma 9(iv)), it follows that $\lim_{n \to \infty} t_n = \infty$.
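The quantity $t_n$ can be computed directly for a finite list of switching probabilities. The sketch below is only an illustration on a truncated sequence: the function name `horizon`, the zero-based indexing, and the convention of returning -1 when even the first term exceeds $\epsilon$ are choices made here, not part of the proof.

```python
def horizon(gammas, n, eps):
    """Largest t with gammas[n] + ... + gammas[n + t] <= eps.

    gammas is a finite list of switching probabilities (an illustrative
    truncation of the sequence gamma_k, indexed from 0). Returns -1 if
    gammas[n] alone exceeds eps, and len(gammas) - 1 - n if the whole
    remaining tail stays within eps.
    """
    total, t = 0.0, -1
    for g in gammas[n:]:
        total += g
        if total > eps:
            break
        t += 1
    return t
```

For a decaying sequence such as `[0.5, 0.25, 0.125, 0.0625, 0.0625]` with `eps = 0.4`, the horizon is -1 at `n = 0` (the first term is already too large) and 1 at `n = 1`.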

Consider an agent $n$ who observes $v_n = e$ and $s_n = s$, and
who makes a decision $x_n = 1$. (To simplify the presentation,
we assume that $s$ does not belong to any of the exceptional,
zero measure sets involved in earlier statements.) The (conditional)
probability that agents $n+1, \ldots, n+t_n$ all decide 1
is

$$\mathbb{P}^1\left( \bigcap_{k=n+1}^{n+t_n} \{\sigma_k(e, s_k) = 1\} \right) = \prod_{k=n+1}^{n+t_n} (1 - \gamma_k) \geq 1 - \sum_{k=n+1}^{n+t_n} \gamma_k \geq 1 - \epsilon.$$

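With agent $n$ choosing the decision $x_n = 1$, its payoff can be
lower bounded by considering only the payoff obtained when
$\theta = 1$ (which, given the information available to agent $n$,
happens with probability $f_n(s)$) and all agents up to $n + t_n$
make the same decision (no switching):

$$U_n(1; e, s) \geq f_n(s) \left( \sum_{k=n}^{n+t_n} \delta^{k-n} \right) (1 - \epsilon).$$

Since $f_n(s) \leq 1$ for all $s \in S$, and

$$\sum_{k=n}^{n+t_n} \delta^{k-n} \leq \frac{1}{1-\delta},$$

we obtain

$$U_n(1; e, s) \geq f_n(s) \left( \sum_{k=n}^{n+t_n} \delta^{k-n} \right) - \frac{\epsilon}{1-\delta}.$$

Combining with part (iii) of Lemma 9 and the fact that
$\lim_{n \to \infty} t_n = \infty$, we obtain

$$\liminf_{n \to \infty} U_n(1; e, s) \geq \frac{1}{1-\delta} - \frac{\epsilon}{1-\delta}. \tag{22}$$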
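On the other hand, the payoff from deciding $x_n = 0$ can be
bounded from above as follows:

$$\begin{aligned} U_n(0; e, s) &= \mathbb{E}\left[ 1_{\theta = 0} + \sum_{k=n+1}^{\infty} \delta^{k-n}\, 1_{x_k = \theta} \,\middle|\, v_n = e,\ s_n = s \right] \\ &\leq \mathbb{P}(\theta = 0 \mid v_n = e,\ s_n = s) + \frac{\delta}{1-\delta} \\ &= 1 - f_n(s) + \frac{\delta}{1-\delta}. \end{aligned}$$

Therefore, using part (iii) of Lemma 9,

$$\limsup_{n \to \infty} U_n(0; e, s) \leq \frac{\delta}{1-\delta}. \tag{23}$$

Our choice of $\epsilon$ implies that

$$\frac{1}{1-\delta} - \frac{\epsilon}{1-\delta} > \frac{\delta}{1-\delta}.$$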



Then, (22) and (23) imply that there exists $N \in \mathbb{N}$ such that
for all $n > N$,

$$U_n(1; e, s) > U_n(0; e, s),$$

almost everywhere in $S$. Hence, by the equilibrium property
of the strategy profile, $\sigma_n(e, s) = 1$ for all $n > N$ and for all
$s \in S$, except possibly on a zero measure set.
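The comparison between the limiting bounds in (22) and (23) is a one-line computation, sketched below for concreteness. The function name `limiting_bounds` and the explicit parameter check are illustrative additions; the two returned values are the right-hand sides of (22) and (23).

```python
def limiting_bounds(delta, eps):
    """Return (lower bound for deciding 1, upper bound for deciding 0).

    Requires 0 <= delta < 1 and 0 < eps < 1 - delta, as in the proof.
    """
    if not (0 <= delta < 1 and 0 < eps < 1 - delta):
        raise ValueError("need 0 <= delta < 1 and 0 < eps < 1 - delta")
    lower_one = (1 - eps) / (1 - delta)   # right-hand side of (22)
    upper_zero = delta / (1 - delta)      # right-hand side of (23)
    return lower_one, upper_zero
```

For example, with `delta = 0.9` and `eps = 0.05` the two bounds are approximately 9.5 and 9.0. The gap is $(1 - \epsilon - \delta)/(1 - \delta)$, which is positive exactly when $\epsilon < 1 - \delta$.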

Suppose that the state of the world is $\theta = 1$. Then, by
part (i) of Lemma 9, $v_n$ converges to $e$, in probability, and
therefore it converges to $e$ almost surely along a subsequence.
In particular, the event $\{v_n = e\}$ happens infinitely often,
almost surely. If that event happens and $n > N$, then every
subsequent $x_k$ will be equal to 1. Thus, $x_n$ converges almost
surely to 1. By a symmetrical argument, if $\theta = 0$, then $x_n$
converges almost surely to 0. Therefore, $x_n$ converges almost
surely to $\theta$. This is impossible, by Theorem 1. We have reached
a contradiction, thus establishing that learning in probability
fails under the equilibrium strategy profile $\sigma$. ■



VIII. Conclusions 

We have obtained sharp results on the fundamental limitations
of learning by a sequence of agents who only get
to observe the decisions of a fixed number $K$ of immediate
predecessors, under the assumption of Bounded Likelihood
Ratios. Specifically, we have shown that almost sure learning
is impossible whereas learning in probability is possible if and
only if $K > 1$. We then studied the learning properties of the
equilibria of a game where agents are forward looking, with
a discount factor $\delta$ applied to future decisions. As $\delta$ ranges
in $[0,1)$ the resulting strategy profiles vary from the myopic
($\delta = 0$) towards the case of fully aligned objectives ($\delta \to 1$).
Interestingly, under a full alignment of objectives and a central
designer, learning is possible when $K \geq 2$, yet learning fails
to obtain at any equilibrium of the associated game, and for
any $\delta \in [0,1)$.

The scheme in Section VI is only of theoretical interest,
because the rate at which the probability of error decays to
zero is extremely slow. This is quite unavoidable, even for
the much more favorable case of unbounded likelihood ratios
[?], and we do not consider the problem of improving the
convergence rate a promising one.

The existence of a decision profile that guarantees learning 
in probability (when $K \geq 2$) naturally leads to the question
of whether it is possible to provide incentives to the agents to 
behave accordingly. It is known [?], [?], [?] that for myopic 
Bayesian agents, learning in probability does not occur, which 
raises the question of designing a game whose equilibria have 
desirable learning properties. Another interesting direction is 
the characterization of the structural properties of decision 
profiles that allow or prevent learning whenever the latter is 
achievable. 

Finally, one may consider extensions to the case of $m > 2$
hypotheses and $m$-valued decisions by the agents. Our negative
results are expected to hold, and the construction of a
decision profile that learns when $K \geq m$ is also expected to
go through, paralleling a similar extension in [?].



